File API Design Evaluation and Latency Budget
Let's discuss how our proposed design meets the non-functional requirements of a file service.
Introduction#
Now that the key choices for a file API service have been made, let's discuss how we meet the non-functional requirements and see situations where some tweaks or changes might be needed to make the service more efficient.
Non-functional requirements#
Let's discuss the non-functional requirements identified in the introduction lesson one by one.
Reliability#
We make shadow copies of service components (API gateways, application servers, UFMS, etc.) that could otherwise become single points of failure (SPOFs). We also keep multiple copies of the data in regionally distributed data centers as backups, which allows us to recover from natural disasters. Moreover, we use circuit breakers and appropriate monitoring mechanisms to detect service-critical issues and resolve them proactively.
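The circuit-breaker pattern mentioned above can be sketched in a few lines. This is a minimal illustration under assumed parameters, not our service's actual implementation: after a configurable number of consecutive failures, the breaker "opens" and fails fast until a reset timeout elapses, shielding a struggling downstream component from further load.

```python
import time


class CircuitBreaker:
    """Minimal circuit breaker: opens after `max_failures` consecutive
    failures and rejects calls until `reset_timeout` seconds pass."""

    def __init__(self, max_failures=3, reset_timeout=30.0):
        self.max_failures = max_failures
        self.reset_timeout = reset_timeout
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_timeout:
                raise RuntimeError("circuit open: failing fast")
            self.opened_at = None  # half-open: allow one trial call
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.monotonic()
            raise
        self.failures = 0  # any success closes the circuit again
        return result
```

A monitoring system would typically pair this with alerts, so operators learn about an open circuit before users do.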
Security#
Our API allows authorized access to authenticated users only. Users can log in either by authenticating directly with their credentials or by using OAuth 2.0 and OIDC using an authorization code and PKCE flow to obtain a third-party access token. Access tokens reduce the risk of data leakage and loss when dealing with third-party applications. Moreover, the stored data is always encrypted, so the attacker can’t extract any valuable information from it, even if it is compromised. We assume our system also has an appropriate intrusion detection mechanism to identify and recover from bad situations.
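As an illustration of the PKCE flow mentioned above, here is how a client could derive the S256 code challenge from a random code verifier, following RFC 7636. The function name is ours, not part of the service API:

```python
import base64
import hashlib
import secrets


def make_pkce_pair():
    """Generate a PKCE code_verifier and its S256 code_challenge
    (RFC 7636): challenge = BASE64URL(SHA-256(verifier)), unpadded."""
    # 32 random bytes -> 43-character URL-safe verifier.
    verifier = base64.urlsafe_b64encode(secrets.token_bytes(32)).rstrip(b"=").decode()
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    challenge = base64.urlsafe_b64encode(digest).rstrip(b"=").decode()
    return verifier, challenge
```

The client sends the challenge with the authorization request and the verifier with the token request, so an intercepted authorization code alone cannot be redeemed for an access token.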
Point to Ponder
Question
How do we ensure that uploaded data is not corrupted en route to the service, and that downloaded data is not corrupted on the way back?
HTTP uses TCP as the underlying transport protocol; TCP checksums detect corrupted segments, and its retransmission mechanism recovers data lost over the wire. SSL/TLS adds a more robust cryptographic integrity check (HMAC) on top of the simple TCP checksum. Additionally, we can hash the data using an algorithm like MD5 or SHA-256 and send the value along with the data. After uploading or downloading the file, the user recalculates the hash using the same algorithm; if it matches the value sent, the data is not corrupted. Hash functions are quite fast on commodity servers, but hashing before and after every upload and download still incurs additional computational cost.
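The hash-and-compare check described above might look like the following sketch; the function names are illustrative, and the data is hashed in chunks as a streaming upload would require:

```python
import hashlib


def sha256_hex(data: bytes, chunk_size: int = 1 << 20) -> str:
    """Hash data in chunks, as we would when streaming a large file."""
    h = hashlib.sha256()
    for i in range(0, len(data), chunk_size):
        h.update(data[i:i + chunk_size])
    return h.hexdigest()


def verify_transfer(received: bytes, expected_digest: str) -> bool:
    """Recompute the digest after an upload/download and compare it
    with the digest that was sent alongside the data."""
    return sha256_hex(received) == expected_digest
```

The client would send `sha256_hex(file_bytes)` with the upload; the server (and later the downloading client) recomputes it to confirm integrity.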
Scalability#
The UFMS decouples metadata from the actual file content, allowing us to manage and scale users and data independently. A flexible storage scheme (capable of handling different structures and schema versions) lets us efficiently store large amounts of data along with its encryption details and compression algorithms. Furthermore, we assume that our storage is horizontally scalable, so we can increase capacity by adding back-end storage servers. We strictly follow the REST paradigm, which improves the reliability and stability of our API service in the long run.
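To illustrate how a schema-versioned storage scheme can absorb format changes, here is a hypothetical upgrade function. The field names and versions are invented for illustration; they are not the actual UFMS schema:

```python
def normalize_metadata(record: dict) -> dict:
    """Upgrade older metadata records to the current (hypothetical) v2
    schema, so readers only ever deal with one shape."""
    version = record.get("schemaVersion", 1)
    out = dict(record)
    if version < 2:
        # v1 stored a bare checksum string; v2 also records the algorithm.
        out["checksum"] = {"algorithm": "sha256", "value": record.get("checksum", "")}
        out["compression"] = record.get("compression", "none")
        out["schemaVersion"] = 2
    return out
```

Reading through a normalizer like this lets old and new records coexist in storage, so a schema change never requires rewriting every stored object at once.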
Availability#
We use an API gateway to separate public endpoints (client-facing) from private endpoints (development endpoints). We impose rate limits on incoming requests and quota limits on third-party consumer applications to reduce server load and prevent DoS attacks. We also enforce a maximum size limit for file uploads to avoid service abuse, and we ensure that the most commonly used endpoints have no single point of failure.
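Rate limiting of the kind described above is often implemented with a token bucket: requests spend tokens, and tokens refill at a fixed rate up to a burst capacity. A minimal sketch (a gateway would use a production-grade, distributed limiter; this only illustrates the mechanism):

```python
import time


class TokenBucket:
    """Token-bucket rate limiter: `rate` tokens/second refill,
    bursts allowed up to `capacity` tokens."""

    def __init__(self, rate: float, capacity: int, now=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)
        self.now = now  # injectable clock makes the limiter testable
        self.last = now()

    def allow(self) -> bool:
        t = self.now()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (t - self.last) * self.rate)
        self.last = t
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

The gateway would keep one bucket per user or per third-party application (for quota limits) and reject requests with `429 Too Many Requests` when `allow()` returns false.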
Low latency#
We use temporary storage for fast uploads and CDN servers for fast downloads. We use streaming encryption and decryption mechanisms to reduce the delay caused by the encryption and decryption process. We also support data transfers over HTTP/2.0 connections for HTTP/2.0-enabled clients to further reduce latency.
A summary of approaches used to achieve the non-functional requirements is given in the table below:
Achieving Non-Functional Requirements
| Non-Functional Requirements | Approaches |
|---|---|
| Reliability | Shadow copies of SPOF-prone components; geographically distributed data replicas; circuit breakers and monitoring |
| Security | Authenticated, authorized access only; OAuth 2.0/OIDC with access tokens; encryption of stored data; intrusion detection |
| Scalability | Metadata decoupled from file content; flexible, versioned storage schema; horizontally scalable storage; REST paradigm |
| Availability | API gateway separating public and private endpoints; rate and quota limits; maximum upload size; no SPOF on common endpoints |
| Low latency | Temporary storage for fast uploads; CDN for fast downloads; streaming encryption/decryption; HTTP/2.0 support |
Latency budget#
Let's get a rough estimate of the response time of our file API for a hypothetical scenario where a professional photographer uploads and downloads a 10 MB raw image.
Note: As discussed in the Back-of-the-envelope Calculations for Latency, the latency of GET and POST requests is affected by two different parameters. In the case of GET, the average RTT remains the same regardless of the data size because the request itself is small, while the time to download the response grows by 0.4 ms per KB. Similarly, for POST requests, the time grows with the data size by 1.15 ms per KB on top of the base RTT, which was 260 ms.
Upload request#
Our file API sends a POST request to upload files to the temporary storage at the backend. We assume that this is a standard POST request, and the request body contains file content. Let's calculate the message size of the POST request.
Request and response size#
We know from the “Back-of-the-envelope Calculations for Latency” that a standard POST request is 2 KB, which includes headers and metadata such as fileId, ownerId, authToken, checksum, userList, and so on. By adding the size of the attachment, we get the overall request size:

Request size = standard POST request size + file size = 2 KB + 10240 KB = 10242 KB
We encrypt data using the AES algorithm before storing it in blob storage. For a 10 MB file, let's assume that the processing time of a standard POST request will increase by approximately 6.191 ms.
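As a sanity check on that figure: hardware-accelerated AES (AES-NI) commonly sustains on the order of 1.6 GB/s on a single core (an assumed throughput for illustration; real numbers vary by CPU), which is consistent with the quoted overhead:

$$ t_{\text{enc}} \approx \frac{10\ \text{MB}}{1.6\ \text{GB/s}} \approx 6.2\ \text{ms} $$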
Point to Ponder
Question
Will file encryption and decryption affect the response time of the API?
Yes, encrypting and decrypting files before storing and sending them back to the user, respectively, results in increased processing time. This is a tradeoff between the response time and the sensitivity of the stored information. If the information is critical, we need to see how much latency we can tolerate to keep the data safe. Also, the major delay in response time is due to network transfers. Usually, servers process information much faster than the delays caused by network transfers. So, encrypting data with a faster algorithm might not be a big deal for the overall response time of an API.
Response time#
Keeping in mind what we learned above about request and response size, let's use the following calculator to find the estimated response time of the POST request:
Response Time Calculator for Uploading a File
| Enter size in KBs | 10242 | KB |
| Minimum latency | 12159.2 | ms |
| Maximum latency | 12240.2 | ms |
| Processing time | 10.191 | ms |
| Minimum response time | 12169.39 | ms |
| Maximum response time | 12250.39 | ms |
Note: You can enhance your understanding of the latency budget by changing the file size and processing time in the response time calculator above.
Assuming the request size is 10242 KB, the latency falls in a range (minimum 12159.2 ms, maximum 12240.2 ms), because the base RTT varies with network conditions while the transfer cost grows at about 1.15 ms per KB.

Similarly, the response time is calculated as:

Response time = latency + processing time
Minimum response time = 12159.2 + 10.191 = 12169.39 ms
Maximum response time = 12240.2 + 10.191 = 12250.39 ms

For processing time, we add the encryption overhead to the base processing time of a standard POST request:

Processing time = 4 + 6.191 = 10.191 ms
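The linear model from the note is easy to turn into a small calculator. A sketch follows; the per-KB rates and the 260 ms base RTT come from the note, while the interactive calculator above uses a measured RTT range, so its numbers differ somewhat:

```python
def response_time_ms(size_kb: float, per_kb_ms: float,
                     base_rtt_ms: float, processing_ms: float) -> float:
    """Linear latency model: base RTT plus a per-KB transfer cost,
    plus server-side processing time."""
    return base_rtt_ms + per_kb_ms * size_kb + processing_ms


# POST (upload) costs 1.15 ms/KB; GET (download) costs 0.4 ms/KB.
upload = response_time_ms(10242, 1.15, 260, 10.191)
download = response_time_ms(10240, 0.4, 260, 10.191)
```

Plugging in different file sizes shows how quickly transfer time dominates: for multi-megabyte files, the base RTT and processing time are rounding errors next to the per-KB cost.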
Download request#
Let's continue with the same example and download the file uploaded in the previous section.
Request and response size#
Our file API downloads files from the server using GET requests. Let's assume that the request size for a GET request is 1 KB (the standard size). The file returned by the server is 10240 KB, as mentioned in the previous section. We use these sizes to calculate the response time for the GET request.
Response time#
Assuming that decrypting the file takes approximately 6.191 ms, let's add the request size in the calculator below to calculate the response time for downloading a 10 MB file.
Response Time Calculator for Downloading a 10 MB file
| Enter size in KBs | 10242 | KB |
| Minimum latency | 4287.3 | ms |
| Maximum latency | 4368.3 | ms |
| Processing time | 10.191 | ms |
| Minimum response time | 4297.49 | ms |
| Maximum response time | 4378.49 | ms |
Assuming that we send a standard GET request and receive a 10242 KB response, the latency falls between 4287.3 ms and 4368.3 ms, since downloads cost about 0.4 ms per KB on top of the base RTT.

Similarly, the response time is calculated as follows:

Response time = latency + processing time
Minimum response time = 4287.3 + 10.191 = 4297.49 ms
Maximum response time = 4368.3 + 10.191 = 4378.49 ms

For processing time, we again add the 6.191 ms decryption overhead to the 4 ms base processing time:

Processing time = 4 + 6.191 = 10.191 ms
A summary of the latency budget for upload and download requests is shown in the illustration below:
The response time for uploading or downloading a file depends on many factors, such as Internet speed, distance from the server, and file size. Considering that the client and server are located in different locations around the world, we can interpret the numbers above as a good average of the response times.
Optimizations and tradeoffs#
This section will discuss some interesting scenarios or variants of the API and what changes could be brought to optimize our service. Let's dive right in.
Large file upload: Uploading large files (say, 1 GB) takes a long time. It’s pretty likely that the upload will be interrupted, and we may have to reupload the entire file. With HTTP/1.1, we can instead upload the file in byte ranges (chunks) that the server assembles after the upload is complete. If the upload is interrupted, we can simply send a HEAD request to learn the last chunk received by the server, then send the next chunk and continue from where we left off.
Note: We send a PUT request to upload the file because we don't want to create a new resource when resuming an interrupted request. The status code 308 Resume Incomplete (a convention used by resumable-upload protocols; standard HTTP defines 308 as Permanent Redirect) means the file exists on the server and we can continue uploading the remaining data instead of reuploading from scratch.
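The byte-range bookkeeping behind this scheme can be sketched as follows. The helper names are ours; the `Content-Range` value format (`bytes first-last/total`) is the standard HTTP form:

```python
def chunk_ranges(total_size: int, chunk_size: int):
    """Yield (first_byte, last_byte) pairs covering the whole file."""
    for start in range(0, total_size, chunk_size):
        yield start, min(start + chunk_size, total_size) - 1


def content_range(first: int, last: int, total: int) -> str:
    """Header value for a resumable PUT of bytes first..last."""
    return f"bytes {first}-{last}/{total}"
```

On resume, the client learns the last byte the server holds (via the HEAD request described above), then issues PUTs with `content_range(...)` headers for the remaining ranges only.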
Point to Ponder
Question
Is there a way to use HTTP/1.1 to reduce latency for uploading large files?
Yes, we can reduce the latency of uploading large files by splitting the data into small chunks, identified by byte ranges, and transferring them over multiple connections in parallel, which reduces the latency significantly. After all the chunks are uploaded, the server reassembles them into one object.
Note: When dealing with chunks of the same object, use the ETag (entity tag) value to identify the different parts; otherwise, the server may flag the reassembled data as corrupted.
The graph above shows that we can achieve faster uploads by creating multiple TCP connections simultaneously. However, most browsers only allow six concurrent connections to the same host. Also, it depends on the available bandwidth from client to server. So, this hack may not work in all cases.
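The parallel-chunk idea can be simulated without a network. In this sketch, `upload_chunk` stands in for a ranged PUT over one of the concurrent connections, and the worker count mirrors the typical browser limit of six connections per host:

```python
from concurrent.futures import ThreadPoolExecutor


def upload_chunk(store: dict, offset: int, chunk: bytes) -> None:
    """Stand-in for a ranged PUT over one parallel connection."""
    store[offset] = chunk


def parallel_upload(data: bytes, chunk_size: int, workers: int = 6) -> bytes:
    """Split data into byte ranges, 'upload' them concurrently,
    then reassemble them in offset order, as the server would."""
    store: dict = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for off in range(0, len(data), chunk_size):
            pool.submit(upload_chunk, store, off, data[off:off + chunk_size])
    return b"".join(store[off] for off in sorted(store))
```

In a real client, the speedup depends on the available bandwidth and the server's concurrency limits; with plenty of headroom, the chunks genuinely transfer in parallel, but on a saturated link the connections just share the same pipe.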
Multi-file upload: Although HTTP/1.1 supports HTTP pipelining, it can hurt API performance when uploading or downloading multiple files: if requests for large files are made first, the remaining small files are delayed behind them. This is known as head-of-line blocking. There are workarounds, such as opening multiple connections to get independent responses, but they waste resources. Even with chunked uploads for large files, HTTP/1.1 underperforms HTTP/2.0 in terms of multiplexing. For such use cases, HTTP/2.0 may be a better choice.
Note: The performance difference between HTTP/2.0 and HTTP/1.1 is significant when transferring multiple small/large files over a single TCP connection.
Upload notification: We may send notifications to clients when an operation (upload, download, or delete) is complete. We can do this by adding a push notification service to our API workflow that uses the technique described in the Design a Pub-Sub service chapter.
Quiz
Question
What is the preferred HTTP version for APIs that upload large files on unstable networks?
HTTP/2.0 and earlier versions are connection-oriented because they run over TCP, so their performance can be affected by network disconnections. On the other hand, HTTP/3.0 works with QUIC, which takes advantage of the connectionless nature of UDP, making it more suitable for unstable networks.
Note: QUIC operates over UDP, which is connectionless, but QUIC itself is a connection-oriented protocol that implements its own algorithms at the transport layer to establish and maintain connections.